Zoltan Konyha, VRVis, konyha@vrvis.at
[PRIMARY contact]
Kresimir Matkovic, VRVis, matkovic@vrvis.at
Wolfgang Freiler, VRVis, freiler@vrvis.at
Denis Gracanin, Virginia Tech, gracanin@vt.edu
Ranko Miklin, University of Zagreb, r.miklin@gmail.com
Tomislav Lipic, University of Zagreb, tomislav.lipic@fer.hr
Mario Beric, University of Zagreb, Mario.Beric@fer.hr
Student team: NO
We have used ComVis (http://www.comvis.at) in our analysis.
ComVis is an interactive visualization application using multiple linked views.
It offers numerous types of linked views for scalar, categorical and time
series data. It also supports composite brushing: brushes defined in the same
view or in different views can be combined using Boolean operators. The
brushed set of items can also be displayed in tabular format. We used a
simple Python script to compute various aggregates from the data set, including
the set of contacts of each person, and the Python package NetworkX
(https://networkx.lanl.gov/) to get an overview of the network graph properties.
Two Page Summary: YES
VRVis-ComVis-Phone-Summary.pdf
ANSWERS:
Phone-1: What is the Catalano/Vidro social network, as reflected in the cell
phone call data, at the end of the time period?
Phone-2: Characterize the changes in the Catalano/Vidro social structure over
the ten day period.
Detailed Answer:
In short, the five persons mentioned in the challenge started using new phone
numbers after day seven, and they moved to different locations. This video
captures a part of our analysis.
Analysis with multiple linked views
The set of linked views shown in Figure 1 provides an overview of
the caller, the callee, the date and time of the call and the location of the
caller. Each point in the 20x20 matrices represents one of the 400 IDs, as a
caller (top left) or as a callee (top right). In the bottom left, a scatter
plot of the latitude/longitude coordinates of the towers provides a map. In the
bottom right, a scatter plot of days (horizontal axis) and time of day
(vertical axis) provides a calendar.
Figure 1: Contacts of ID200.
Figure 2: Phone calls between IDs 200 and 5.
The description
of the mini challenge says that we have medium confidence that Ferdinando
Catalano is identifier 200 and he would call Estaban Catalano most frequently.
We brushed ID200 in the caller matrix in Figure 1.
The linked callee matrix and the detail table indicate that he called IDs 1, 2,
3, 5, 97 and 137.
We then brushed each of the highlighted callees in turn. The logical AND of the
two brushes (caller and callee) selects the calls from ID200 to that individual
callee. The highlighted points in the calendar show when the calls were made. A
snapshot of this process is shown in Figure 2. ID200 calls ID5 most often (once
every day in the first days). There are also seven calls from ID5 to ID200 with
a similar temporal pattern. We concluded that ID5 is Estaban Catalano.
ID200 did not make any calls to IDs 1, 2, 3 or 5 on the last three days. We
checked whether anyone else did. The logical OR combination of the two brushes
in the callee matrix in Figure 3 selects the records where those numbers were
called. All but one of those calls were made in the first seven days. We also
noticed that these four IDs made no calls on days 8 and 9.
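The composite brushing used above can be approximated outside ComVis with plain set operations. The following is only a minimal sketch, not how ComVis implements brushing: the record layout `(caller, callee, day, tower)`, the `brush` helper and the toy records are assumptions made for illustration and are not taken from the challenge data.

```python
# Hypothetical toy records: (caller, callee, day, tower).
# These are NOT the challenge data; they only illustrate the idea.
calls = [
    (200, 5, 1, 29),
    (200, 1, 2, 29),
    (5, 200, 2, 30),
    (17, 5, 3, 11),
    (200, 5, 3, 29),
]


def brush(records, predicate):
    """Return the indices of the records selected by one brush."""
    return {i for i, rec in enumerate(records) if predicate(rec)}


# Individual brushes, one per view.
caller_is_200 = brush(calls, lambda r: r[0] == 200)
callee_is_5 = brush(calls, lambda r: r[1] == 5)
callee_is_1 = brush(calls, lambda r: r[1] == 1)

# Logical AND of two brushes: calls from ID200 to ID5.
from_200_to_5 = caller_is_200 & callee_is_5

# Logical OR of two brushes: calls made to ID1 or to ID5 by anyone.
to_1_or_5 = callee_is_1 | callee_is_5

# The "calendar" of the AND selection: days on which ID200 called ID5.
days_200_to_5 = sorted(calls[i][2] for i in from_200_to_5)
```

Representing each brush as a set of record indices keeps the Boolean combinations trivial, which mirrors how the linked views highlight the same records everywhere.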
Figure 3: Calls to IDs 1, 2, 3 and 5.
Figure 4: Number of people calling contacts of ID200.
What we learned from aggregates
Using a Python script, we computed aggregates for each of the 400 persons,
including the number of incoming and outgoing calls, the number of IDs called
by the person, the number of IDs calling the person and a time series that
indicates the number of phone calls in each hour. The time series gave us some
valuable insight into the temporal patterns, which are discussed in the summary.
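The aggregates listed above could be computed along the following lines. This is a sketch under stated assumptions rather than our actual script: the record layout `(caller, callee, day, hour)`, the function name `compute_aggregates` and the toy records are hypothetical, and the hourly series here counts both incoming and outgoing calls, which is one possible reading of "number of phone calls in each hour".

```python
from collections import defaultdict

# Hypothetical toy records: (caller, callee, day, hour). Not the challenge data.
calls = [
    (200, 5, 1, 9),
    (200, 1, 1, 10),
    (5, 200, 1, 10),
    (17, 5, 2, 9),
    (200, 5, 2, 9),
]


def compute_aggregates(records):
    """Per-person call counts, contact sets and an hourly call series."""
    agg = defaultdict(lambda: {
        "outgoing": 0,                       # number of calls made
        "incoming": 0,                       # number of calls received
        "called": set(),                     # IDs called by the person
        "called_by": set(),                  # IDs calling the person
        "calls_per_hour": defaultdict(int),  # (day, hour) -> number of calls
    })
    for caller, callee, day, hour in records:
        agg[caller]["outgoing"] += 1
        agg[caller]["called"].add(callee)
        agg[caller]["calls_per_hour"][(day, hour)] += 1
        agg[callee]["incoming"] += 1
        agg[callee]["called_by"].add(caller)
        agg[callee]["calls_per_hour"][(day, hour)] += 1
    return agg


aggregates = compute_aggregates(calls)
```

The sizes of the `called` and `called_by` sets correspond to the two quantities plotted in the scatter plots of Figure 4.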
In Figure 4 we added a scatter plot showing the number
of people calling a given person (top right) and one showing the number of
people called by a given person (bottom right). We brushed the six contacts of
ID200. The top right scatter plot and the detail table show that IDs 1 and 5
have received calls from many people. This is typical for someone coordinating
a network. We know that ID5 is Estaban Catalano. Therefore we assume that ID1 is David Vidro. IDs 2 and 3 also
received calls from many people, while 97 and 137 have fewer contacts. IDs 2 and 3 are Juan Vidro and Jorge Vidro,
but we cannot decide which is which.
Figure 5: ID0 talks to many people.
Figure 6: IDs 306, 309, 360 and 397 have many contacts, too.
Figure 5 shows that ID0 called the most people and
received calls from many different people. ID0 is an important node in the
network, but we do not know the associated name. In Figure
6 we have brushed four more persons who received calls from many different
people. The detail view displays their IDs: 306, 309, 360 and 397. They made
calls in the last three days only.
Figure 7: People who had many contacts in the first seven days.
Figure 8: People who had many contacts in the last three days.
We suspected that something changed after day seven, so we created separate
aggregates for the first seven and the last three days. In Figure 7 we brushed
IDs that had received calls from many contacts only in the first seven days,
but not in the last three. They are IDs 1, 2, 3 and 5. Conversely, Figure 8
shows that IDs 300, 306, 309, 360 and 397 were called by many people in the
last three days but not in the first seven.
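The split into two periods can be sketched as follows. The record layout `(caller, callee, day)`, the helper names, the threshold value and the toy records are all assumptions for illustration; in our analysis the selection was made interactively by brushing, not with a fixed threshold.

```python
from collections import defaultdict

# Hypothetical toy records: (caller, callee, day). Not the challenge data.
calls = [
    (10, 1, 2), (11, 1, 3), (12, 1, 5),         # ID1 is called on days 1-7 only
    (10, 309, 8), (11, 309, 9), (12, 309, 10),  # ID309 is called on days 8-10 only
    (10, 0, 1), (11, 0, 4), (10, 0, 8), (12, 0, 9),  # ID0 is called in both periods
]


def callers_per_callee(records, days):
    """Map each callee to the set of distinct callers within the given days."""
    result = defaultdict(set)
    for caller, callee, day in records:
        if day in days:
            result[callee].add(caller)
    return result


def popular_only_in(first, second, threshold):
    """IDs with at least `threshold` distinct callers in `first` but fewer in `second`."""
    return sorted(
        pid for pid, callers in first.items()
        if len(callers) >= threshold and len(second.get(pid, ())) < threshold
    )


early = callers_per_callee(calls, range(1, 8))   # days 1-7
late = callers_per_callee(calls, range(8, 11))   # days 8-10

early_only = popular_only_in(early, late, threshold=2)
late_only = popular_only_in(late, early, threshold=2)
```

In this toy example ID0 is excluded from both lists because it stays popular in both periods, which matches the role ID0 plays in the real data.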
Figure 9: Histogram of the number of common contacts.
We computed the common contacts for each pair of the IDs 0, 1, 2, 3, 5, 13,
200, 300, 306, 309, 360 and 397. In Figure 9, we can see that the pairs 1 and
309, 2 and 397, 3 and 360, and 5 and 306 have many common contacts. One ID in
each pair was active on the first seven days while the other one was active on
the last three. We suspect that after day seven the persons using IDs 1, 2, 3
and 5 started using the numbers 309, 397, 360 and 306, respectively.
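The pairwise comparison described above amounts to intersecting contact sets. The sketch below assumes undirected contacts (a call in either direction counts) and uses hypothetical helper names and toy records; it is not the script we ran.

```python
from itertools import combinations

# Hypothetical toy records: (caller, callee). Not the challenge data.
calls = [
    (1, 10), (1, 11), (12, 1),
    (309, 10), (11, 309), (309, 12),
    (5, 20), (306, 20), (306, 21),
]


def contact_sets(records):
    """Map each ID to the set of IDs it called or was called by."""
    contacts = {}
    for caller, callee in records:
        contacts.setdefault(caller, set()).add(callee)
        contacts.setdefault(callee, set()).add(caller)
    return contacts


def common_contacts(contacts, ids):
    """Common contacts for every unordered pair of the given IDs."""
    return {
        (a, b): contacts.get(a, set()) & contacts.get(b, set())
        for a, b in combinations(sorted(ids), 2)
    }


contacts = contact_sets(calls)
common = common_contacts(contacts, [1, 5, 306, 309])

# Rank the pairs by the number of shared contacts, largest first.
ranked = sorted(
    ((pair, len(shared)) for pair, shared in common.items()),
    key=lambda item: -item[1],
)
```

Pairs at the top of `ranked` are the candidates for "same person, different phone number", provided the two IDs are active in different periods.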
The only common contacts of IDs 1, 2, 3 and 5 are 0 and 200. ID0 has mostly the
same partners before and after day seven, thus we assume it belongs to the same
person throughout. IDs 306, 309, 360 and 397 are the same four people as IDs 1,
2, 3 and 5. Their only common contact is ID300, which also becomes active only
in the last three days. Therefore, we assume that ID200 became ID300 after day
seven. In the last three days, IDs 1, 2, 3, 5 and 200 talk to people they have
not (often) talked to before, therefore we assume that different people started
using those phones. The following table summarizes the changes in the network:
Name | ID on days 1-7 | ID on days 8-10
Ferdinando Catalano | 200 | 300
Estaban Catalano | 5 | 306
David Vidro | 1 | 309
Jorge Vidro or Juan Vidro | 2 | 397
Jorge Vidro or Juan Vidro | 3 | 360
We studied the locations of the towers that those ten numbers were calling from
to get an idea of the geographical extent of the movement. In general, we found
that in the first seven days they stayed mostly near towers 11 and 29 in the
city in the middle of the island and near tower 30 in the north of the island.
After day seven, some of them moved to the south of the island. The following
table provides details of the locations of callers.
ID | Calling from tower
0 | From tower 7 and in the evenings from 21
1 | 11 and 29
2 | Mostly from 29, one call from 11
3 | Mostly from 30, few from 10
5 | Mostly from 30, few from 29
200 | Mostly from 29, few from 28, 13 in the evenings
300 | From 29 until about 6 PM on day 8, then from 17
306 | From 30 on day 8, from 29 on day 9, from 12 on day 10
309 | Quickly traveling between towers 7, 11, 29, 21, 22.
360 | Mostly near tower 30 on day 8, near 28 on day 9 and
397 | Traveling from tower 20 to 3 through 29 on days 8 and 9.
A part of the
procedure of gathering this information is captured in this video.
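A rough, non-interactive counterpart of this tower analysis is to count, for each ID, how often each tower carried its outgoing calls. The sketch below is an illustration only: the record layout `(caller, day, tower)`, the helper names and the toy records are assumptions, and the table above was compiled by brushing in ComVis rather than by this code.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (caller, day, tower). Not the challenge data.
calls = [
    (2, 1, 29), (2, 2, 29), (2, 3, 11),
    (300, 8, 29), (300, 8, 29), (300, 8, 17), (300, 9, 17),
]


def towers_by_caller(records):
    """Count how often each caller placed a call through each tower."""
    usage = defaultdict(Counter)
    for caller, day, tower in records:
        usage[caller][tower] += 1
    return usage


def dominant_tower_per_day(records, caller):
    """For one caller, the most frequently used tower on each day."""
    per_day = defaultdict(Counter)
    for who, day, tower in records:
        if who == caller:
            per_day[day][tower] += 1
    return {
        day: counts.most_common(1)[0][0]
        for day, counts in sorted(per_day.items())
    }


usage = towers_by_caller(calls)
timeline_300 = dominant_tower_per_day(calls, 300)
```

The per-day view is what reveals movement: a change in the dominant tower from one day to the next suggests that the phone has been carried to a different location.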
We have a weak suspicion
that tower 30 is not where the map indicates, but somewhere near 28 and 29.
However, we were unable to find strong enough evidence that would have allowed
us to make such an important modification in the data.